Remove redundant overflow checks
Today I’m going to review pull_8757, My selection criteria it’s the PR should be
- Neither too new or too old
- Passed most of the tests and been labeled awaiting review
- I can review it in a day.
(Another python contributor Tal Einat gave me so many advises in my yesterday post, Thank you so much, Tal.)
Step 1: discussion
I Searched the issue discussion
Max size of list and tuples is limited by PY_SSIZE_T_MAX / sizeof(PyObject*), so the sum of any two list/tuples sizes always <= PY_SSIZE_T_MAX if sizeof(PyObject*) > 1, which seems true for all supported (existing?) platforms.
It means that overflow checks in app1, ins1, list_concat and tupleconcat are redundant and can be safely removed.
It looks like we overflow checks in app1, ins1, list_concat function, Tim Peters already reviewed some of the come so now is my turn.
Step 2: commit messages and files changed
commit messages:
Remove redundant overflow checks in list and tuple implementation
files changed:
I’mo familiar with C, it’s kinda uncomfortable to review code I’m not familiar with. However, this is also a good way to learn c in a real project, right? let’s dive into the code.
Step 3: dive into the code
Search context
The changed locate at function ins1 which lack of comments.
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
Py_ssize_t i, n = Py_SIZE(self);
...
/* Check the list is full or not
if (n == PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
}
After I search the call stack, this method is used to insert an item in a list. For example:
lst = []
# We will call function ins1 when inserting value in a list
lst.insert(10, 0)
BTW, what is PY_SSIZE_T_MAX actually, I found sys.maxsize in the document
An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.
This PR said the max number of a list should be PY_SSIZE_T_MAX / sizeof(PyObject*) instead of PY_SSIZE_T_MAX so we overcheck in the code.
Step 4: Prove speculation
W can find The maxium number in the list in list_resize
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
PyErr_NoMemory();
return -1;
}
And According to PEP353
A new type Py_ssize_t is introduced, which has the same size as the compiler’s size_t type, but is signed. It will be a typedef for ssize_t where available.
I don’t know why we use siez_t here, so I asked at python.zulipchat, Ammar Askar gave me a great answer. So, after that, I left some comments on the PR, I suggested to keep the error messages add some edit.
Conclusion
I’m not afraid of C that much, I reviewed the C syntax and I found it’s much easier than I learned before, I also found another challenge for myself, find the difference of sys.max, PY_SSIZE_T_MAX and SIZE_MAX.